162 research outputs found

    The HyperBagGraph DataEdron: An Enriched Browsing Experience of Multimedia Datasets

    Full text link
    Traditional verbatim browsers return information in a linear way according to a ranking performed by a search engine, which may not be optimal for the surfer. The latter may need to assess the pertinence of the information retrieved, particularly when s/he wants to explore other facets of a multi-faceted information space. For instance, in a multimedia dataset different facets such as keywords, authors, publication category, organisations and figures can be of interest. Visualising these facets simultaneously can help to gain insights into the information retrieved and call for further searches. Facets are co-occurrence networks, modeled by HyperBag-Graphs -- families of multisets -- and are in fact linked not only to the publication itself, but to any chosen reference. These references allow the user to navigate inside the dataset and perform visual queries. We explore here the case of scientific publications based on arXiv searches. Comment: Extension of the hypergraph framework shortly presented in arXiv:1809.00164 (possible small overlaps); uses the theoretical framework of hb-graphs presented in arXiv:1809.0019
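    To make the hb-graph idea concrete, here is a minimal sketch of a facet as a family of multisets, where each reference (e.g. a publication) carries a multiset of facet values with multiplicities. The class name, method names and sample data are illustrative assumptions, not the paper's API.

```python
from collections import Counter

# A facet modeled as an hb-graph: a family of multisets, one per reference.
# Counter keeps multiplicities, which plain sets would lose.
class FacetHbGraph:
    def __init__(self):
        self.hb_edges: dict[str, Counter] = {}  # reference id -> multiset

    def add_occurrence(self, reference: str, value: str, multiplicity: int = 1):
        self.hb_edges.setdefault(reference, Counter())[value] += multiplicity

    def co_occurrences(self, value: str):
        """References whose multiset contains `value`, with its multiplicity."""
        return {ref: ms[value] for ref, ms in self.hb_edges.items() if ms[value] > 0}

g = FacetHbGraph()
g.add_occurrence("arXiv:1809.00164", "hypergraphs", 3)  # keyword facet, weight 3
g.add_occurrence("arXiv:1809.00164", "visualisation")
print(g.co_occurrences("hypergraphs"))  # {'arXiv:1809.00164': 3}
```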

    Can feature information interaction help for information fusion in multimedia problems?

    Get PDF
    This article presents feature information interaction, an information-theoretic measure that can describe complex feature dependencies in multivariate settings. According to the theoretical development, feature interactions are more accurate than current bivariate dependence measures due to their stable and unambiguous definition. In experiments with artificial and real data we first compare the empirical dependency estimates of correlation, mutual information and 3-way feature interaction. Then, we present feature selection and classification experiments that show superior performance of interactions over bivariate dependence measures for the artificial data; for real-world data this goal is not yet achieved.
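    The 3-way interaction can be written in terms of joint entropies. Below is a small sketch that estimates it from discrete samples; note that sign conventions for interaction information vary in the literature, and the XOR data is just a standard illustration of a purely triple-wise dependency, not the paper's experiment.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Shannon entropy (bits) of empirical joint samples (tuples)."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def interaction_information(x, y, z):
    """I(X;Y;Z) = H(X)+H(Y)+H(Z) - H(XY)-H(XZ)-H(YZ) + H(XYZ)."""
    H = lambda *cols: entropy(list(zip(*cols)))
    return (H(x) + H(y) + H(z)
            - H(x, y) - H(x, z) - H(y, z)
            + H(x, y, z))

# XOR: every pairwise dependence is (near) zero, yet the triple interacts.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10_000)
y = rng.integers(0, 2, 10_000)
z = x ^ y
print(round(interaction_information(x, y, z), 2))  # about -1 bit here
```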

    Distributed media indexing based on MPI and MapReduce

    Get PDF
    Web-scale digital assets comprise millions or billions of documents. At this scale, sequential algorithms cannot cope with the data, and parallel and distributed computing becomes the solution of choice. MapReduce is a programming model proposed by Google for scalable data processing, mainly applicable to data-intensive algorithms. In contrast, the Message Passing Interface (MPI) is suited to high-performance algorithms. This paper proposes an adapted structure of the MapReduce programming model using MPI for multimedia indexing. Experiments on various multimedia applications validate our model and indicate that it achieves good speedup compared to the original sequential versions, Hadoop, and earlier versions of MapReduce over MPI.
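    A minimal sketch of the map/reduce pattern expressed over MPI, assuming mpi4py is installed; the word-count "indexing" task stands in for real multimedia feature extraction and is not the paper's implementation.

```python
# Run with e.g.: mpiexec -n 4 python mr_sketch.py
from collections import Counter
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    docs = ["cat dog", "dog fish", "cat cat", "fish dog"] * size
    chunks = [docs[i::size] for i in range(size)]  # split input among ranks
else:
    chunks = None

local_docs = comm.scatter(chunks, root=0)  # distribute map inputs

# Map phase: each rank emits a partial (term -> count) index.
local_index = Counter()
for doc in local_docs:
    local_index.update(doc.split())

# Reduce phase: merge partial indexes at the root.
partials = comm.gather(local_index, root=0)
if rank == 0:
    print(dict(sum(partials, Counter())))
```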

    Information-theoretic temporal segmentation of video and applications: multiscale keyframes selection and shot boundaries detection

    Get PDF
    The first step in the analysis of video content is the partitioning of a long video sequence into short homogeneous temporal segments. The homogeneity property ensures that each segment is captured by a single camera and represents a continuous action in time and space. These segments can then be used as atomic temporal components for higher-level analysis such as browsing, classification, indexing and retrieval. The novelty of our approach is to use color information to partition the video into dynamically homogeneous segments using a criterion inspired by compact coding theory. We perform an information-based segmentation using a Minimum Message Length (MML) criterion minimized by a Dynamic Programming Algorithm (DPA). We show that our method is efficient and robust in detecting all types of transitions in a generic manner; a specific detector for each type of transition of interest therefore becomes unnecessary. We illustrate our technique with two applications: multiscale keyframe selection and generic shot boundary detection.
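    The segmentation can be sketched as a dynamic-programming search over candidate boundaries. In the sketch below, the segment cost is the within-segment squared deviation plus a constant per-segment penalty, a simple stand-in for the paper's MML criterion; the descriptors, penalty value and toy data are illustrative.

```python
import numpy as np

def segment(frames, penalty=1.0):
    """Optimal segmentation of (n, d) per-frame descriptors by DP."""
    n, d = frames.shape
    # Prefix sums let any segment cost be evaluated in O(d).
    s1 = np.vstack([np.zeros(d), np.cumsum(frames, axis=0)])
    s2 = np.vstack([np.zeros(d), np.cumsum(frames**2, axis=0)])

    def cost(i, j):  # frames i..j-1 as one segment
        m = j - i
        mean = (s1[j] - s1[i]) / m
        return float(np.sum((s2[j] - s2[i]) - m * mean**2)) + penalty

    best = np.full(n + 1, np.inf); best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):            # O(n^2) boundary search
        for i in range(j):
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i

    cuts, j = [], n                      # recover boundaries
    while j > 0:
        cuts.append(j); j = back[j]
    return sorted(cuts)

# Two homogeneous runs -> one boundary detected at frame 5.
frames = np.array([[0.0]] * 5 + [[1.0]] * 5)
print(segment(frames, penalty=0.1))  # [5, 10]
```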

    Handling temporal heterogeneous data for content-based management of large video collections

    Get PDF
    Video document retrieval is now an active part of the domain of multimedia retrieval. Unlike for other media, however, managing a collection of video documents adds the problem of efficiently handling an overwhelming volume of temporal data. Challenges include balancing efficient content modeling and storage against fast access at various levels. In this paper, we detail the framework we have built to accommodate our developments in content-based multimedia retrieval. We show that our framework not only facilitates the development of processing and indexing algorithms but also opens the way to several other possibilities such as rapid interface prototyping and retrieval algorithm benchmarking. We discuss our developments in relation to wider contexts such as MPEG-7 and the TREC Video Track.

    Serendipitous Exploration of Large-scale Product Catalogs

    Get PDF
    Abstract -- Online shopping has developed to a stage where catalogs have become very large and diverse. Thus, it is a challenge to present relevant items to potential customers within very few interactions, even more so when users have no defined shopping objectives but operate in an opportunistic mindset. This problem is often tackled by recommender systems. However, these systems rely on consistent user interaction patterns to predict items of interest. In contrast, we propose to adapt the classical information retrieval (IR) paradigm for the purpose of accessing catalog items in a context of unpredictable user interaction. Accordingly, we present a novel information access strategy based on the notion of interest rather than relevance. We detail the design of a scalable browsing system combining learning capabilities with a limited-memory model. Our approach enables locating interesting items within a few steps while not requiring good-quality descriptions. Our system allows customers to seamlessly change browsing objectives without having to explicitly start a new session. An evaluation of our approach based on both artificial and real-life datasets demonstrates its efficiency in learning and adaptation.

    I. MOTIVATION. The emergence of online shopping has offered new opportunities to propose services and products to customers. Currently, many online shops are no longer restricted to a certain category of products. For example Amazon, initially focused on cultural and entertainment media (books, music, and video), now offers products as diverse as home appliances or jewelry. Even more crucially, we usually find thousands of items within a product category, e.g. 38 million books and 3.5 million jewelry items on Amazon. Both the breadth of product lines and the depth within a product line not only boost the volume of the catalogs but also make it difficult for the customer to find products of interest without an accurate search protocol. Presenting relevant products to potential customers is the goal of recommender systems. Independent of their type (collaborative filtering systems, content-based recommenders, etc.), recommender systems usually operate on a user profile gained from previous shopping sessions. For this reason, recommender systems suffer from the cold-start problem when new users and/or new products appear. In contrast to the above, our approach requires neither the definition of a user profile nor specific search sessions with pre-defined objectives. In other words, we present an efficient product access strategy enabling intuitive browsing by estimating the user's intention from his/her input to the system and displaying the items considered most interesting to him/her (and thus most likely to be purchased). Our new information access strategy is based on the notion of current interest rather than on the notion of relevance classically used in Information Retrieval. Our objectives are: (O1) we accommodate serendipity, assuming no pre-defined (fixed) objective in the user's chain of actions; (O2) the system matches classic (simple) interaction models; (O3) the system is scalable in terms of the volume of the product catalog. Our approach results in an interactive navigation system which lets the user operate naturally over the product catalog while swiftly reacting to changes in the browsing objectives.
The major difference from earlier approaches is a rapidly adapting system that copes with radical changes and scales to realistic-size product catalogs. The remainder of the paper is structured as follows: in section II, we discuss relevant approaches for information characterisation and content access strategies in large repositories. In section III, we present our interaction model, which describes the type of interaction expected from the user and what information it carries. We formalise our navigation model, anticipating functional issues, in section IV; in particular, we review the properties ensuring its scalability and compatibility with other models. In section V, we propose a comprehensive assessment of the performance of our model in an adaptive browsing scenario: at every browsing step, the system aims at displaying the most useful items to the user with respect to past interaction. Our study includes an inherent temporal dimension, which makes the evaluation context different from that of classical search.
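    A minimal sketch of an interest-driven, limited-memory browsing loop: the current interest is an exponentially decayed average of the descriptors of recently clicked items, so a change of objective is forgotten quickly. The decay factor and the dot-product scoring are assumptions for illustration, not the paper's model.

```python
import numpy as np

class Browser:
    def __init__(self, catalog, decay=0.5):
        self.catalog = catalog                      # (n_items, d) descriptors
        self.decay = decay                          # forget factor in (0, 1)
        self.interest = np.zeros(catalog.shape[1])  # limited-memory state

    def click(self, item_id):
        # Old interest decays, so new objectives dominate within a few steps.
        self.interest = (self.decay * self.interest
                         + (1 - self.decay) * self.catalog[item_id])

    def display(self, k=3):
        scores = self.catalog @ self.interest
        return np.argsort(scores)[::-1][:k]  # k most "interesting" items

rng = np.random.default_rng(1)
b = Browser(rng.random((100, 8)))
b.click(7); b.click(7)   # two clicks on the same item shift the interest
print(b.display())       # items most aligned with the current interest
```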

    Prediction of HIV status based on socio-behavioural characteristics in East and Southern Africa.

    Get PDF
    INTRODUCTION: High-yield HIV testing strategies are critical to reach epidemic control in high-prevalence and low-resource settings such as East and Southern Africa. In this study, we aimed to predict the HIV status of individuals living in Angola, Burundi, Ethiopia, Lesotho, Malawi, Mozambique, Namibia, Rwanda, Zambia and Zimbabwe with the highest precision and sensitivity for different policy targets and constraints, based on a minimal set of socio-behavioural characteristics.

    METHODS: We analysed the most recent Demographic and Health Survey from these 10 countries to predict individuals' HIV status using four different algorithms (a penalized logistic regression, a generalized additive model, a support vector machine, and gradient boosting trees). The algorithms were trained and validated on 80% of the data, and tested on the remaining 20%. We compared the predictions based on the F1 score, the harmonic mean of sensitivity and positive predictive value (PPV), and we assessed the generalization of our models by testing them against an independent left-out country. The best performing algorithm was trained on a minimal subset of variables identified as the most predictive, and used to 1) identify 95% of people living with HIV (PLHIV) while maximising precision and 2) identify groups of individuals by adjusting the probability threshold of being HIV positive (90% in our scenario) for achieving specific testing strategies.

    RESULTS: Overall, 55,151 males and 69,626 females were included in the analysis. The gradient boosting trees algorithm performed best in predicting HIV status, with a mean F1 score of 76.8% [95% confidence interval (CI) 76.0%-77.6%] for males (vs [CI 67.8%-70.6%] for SVM) and 78.8% [CI 78.2%-79.4%] for females (vs [CI 73.4%-75.8%] for SVM). Among the ten most predictive variables for each sex, nine were identical: longitude, latitude and altitude of place of residence, current age, age of most recent partner, total lifetime number of sexual partners, years lived in current place of residence, condom use during last intercourse, and wealth index. Only age at first sex for males (ranked 10th) and Rohrer's index for females (ranked 6th) differed between the sexes. Our large-scale scenario, which consisted of identifying 95% of all PLHIV, would have required testing 49.4% of males and 48.1% of females while achieving a precision of 15.4% for males and 22.7% for females. For the second scenario, only 4.6% of males and 6.0% of females would have had to be tested to find 55.7% of all males and 50.5% of all females living with HIV.

    CONCLUSIONS: We trained a gradient boosting trees algorithm to find 95% of PLHIV with a precision twice as high as general population testing, using only a limited number of socio-behavioural characteristics. We also successfully identified people at high risk of infection who may be offered pre-exposure prophylaxis or voluntary medical male circumcision. These findings can inform the implementation of new high-yield HIV tests and help develop very precise strategies based on low-resource setting constraints.
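    The modelling pipeline described above (80/20 split, gradient boosting trees, F1 evaluation, and a raised probability threshold to isolate a high-risk group) can be sketched as follows on synthetic data; the dataset, thresholds and printed numbers are illustrative stand-ins, not the survey variables or the study's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, precision_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic stand-in for the survey data (about 10% positives).
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

print("F1 at the default 0.5 threshold:", f1_score(y_te, proba >= 0.5))

# Second-scenario analogue: raise the threshold (90% in the paper) so only
# the highest-risk fraction of the test population is flagged for testing.
flagged = proba >= 0.9
print("share flagged:", flagged.mean(),
      "precision among flagged:", precision_score(y_te, flagged))
```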